2 Data
2.1 Technical Description
In this project, we use three multimodal datasets to examine physiological stress responses in three contexts: a laboratory, an office, and real-world driving. The physiological recordings come from wearable sensors (primarily the Empatica E4 wristband and the RespiBAN chest device), and all three datasets were obtained from official academic repositories.
2.1.1 WESAD (Wearable Stress and Affect Detection)
- Source: UCI Machine Learning Repository
- Context: Controlled laboratory environment.
- Participants: 15 subjects (IDs S2–S17; there is no S12, because that subject was discarded by the dataset authors).
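- Data Collection: Physiological signals were recorded with a chest-worn RespiBAN device (700 Hz) and a wrist-worn Empatica E4. The E4 sampling rate depends on the channel: 64 Hz for BVP, 32 Hz for the accelerometer, and 4 Hz for EDA and temperature.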
- Variables Used: We focus on Electrocardiogram (ECG), Electrodermal Activity (EDA), Body Temperature (Temp), and Respiration (Resp).
- Preprocessing: The raw data are distributed as Python pickle (.pkl) files, which we loaded with Python scripts. We extracted the chest sensor data and kept only the four study conditions (Baseline, Stress, Amusement, Meditation). We then downsampled the signals from 700 Hz to 10 Hz so that they could be visualized efficiently in R.
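The WESAD preprocessing step can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's actual script. It assumes the per-subject dictionary layout used by WESAD (`data["signal"]["chest"][channel]` and `data["label"]`) and label codes 1–4 for Baseline, Stress, Amusement, and Meditation. It also assumes `encoding="latin1"` is needed when unpickling, which is commonly the case for these files. The downsampling method is not stated in the source, so the sketch uses block averaging over 70-sample windows (700 / 10). The names `chest_to_frame` and `load_subject` are hypothetical.

```python
import pickle

import numpy as np
import pandas as pd

FS_CHEST = 700                   # RespiBAN sampling rate (Hz)
FS_TARGET = 10                   # target rate for plotting in R (Hz)
FACTOR = FS_CHEST // FS_TARGET   # 70 raw samples per downsampled sample

# WESAD label codes for the four conditions we keep (assumed mapping)
CONDITIONS = {1: "Baseline", 2: "Stress", 3: "Amusement", 4: "Meditation"}
CHANNELS = ["ECG", "EDA", "Temp", "Resp"]


def load_subject(path):
    """Hypothetical helper: load one subject pickle, decoding Python 2 strings as latin1."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


def chest_to_frame(data, subject_id):
    """Keep chest ECG/EDA/Temp/Resp for the four conditions and block-average 700 Hz -> 10 Hz."""
    chest = data["signal"]["chest"]
    df = pd.DataFrame({ch: np.asarray(chest[ch], dtype=float).reshape(-1) for ch in CHANNELS})
    df["label"] = np.asarray(data["label"]).reshape(-1)
    # Number contiguous runs of identical labels so no average spans a condition change
    df["run"] = (df["label"] != df["label"].shift()).cumsum()
    df = df[df["label"].isin(list(CONDITIONS))].copy()
    # Split each run into consecutive 70-sample blocks (the last block may be shorter)
    df["block"] = df.groupby("run").cumcount() // FACTOR
    agg = {ch: "mean" for ch in CHANNELS}
    agg["label"] = "first"
    out = df.groupby(["run", "block"], sort=True).agg(agg).reset_index(drop=True)
    out["condition"] = out["label"].map(CONDITIONS)
    out["subject"] = subject_id
    return out
```

Averaging within contiguous label runs means that samples from the transient periods (label 0) and from different conditions never fall into the same 10 Hz sample. A plain `[::70]` slice would be simpler, but it discards 69 of every 70 samples instead of smoothing them.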
2.1.2 SWELL-KW (SWELL Knowledge Work)
- Source: DANS / Radboud University
- Context: Simulated office environment in which participants performed typical knowledge-work tasks.
- Participants: 25 subjects.
- Conditions: Subjects performed typical office tasks (writing reports, reading emails) under varying conditions: Neutral (no stress), Email Interruptions, and Time Pressure.
- Preprocessing: We utilized the “Behavioral features per minute” dataset. The original Excel file was converted to CSV format using Python to ensure compatibility and consistent column naming.
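The Excel-to-CSV conversion for SWELL-KW could look like the following minimal sketch. The exact column-naming rule used in the project is not given, so the one below is an assumption: trim whitespace, lowercase, and collapse non-alphanumeric runs to `_`. The helper names `normalize_columns` and `excel_to_csv` are hypothetical. Reading `.xlsx` files with `pandas.read_excel` requires the `openpyxl` engine to be installed.

```python
import re

import pandas as pd


def normalize_columns(df):
    """Hypothetical naming rule: trim, lowercase, and replace non-alphanumeric runs with '_'."""
    def clean(name):
        name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip().lower())
        return name.strip("_")

    cols = [clean(c) for c in df.columns]
    # Refuse silent overwrites if two original headers map to the same name
    if len(set(cols)) != len(cols):
        raise ValueError("column names collide after normalization")
    out = df.copy()
    out.columns = cols
    return out


def excel_to_csv(xlsx_path, csv_path, sheet_name=0):
    """Convert one Excel sheet to CSV with normalized headers (.xlsx needs openpyxl)."""
    df = pd.read_excel(xlsx_path, sheet_name=sheet_name)
    normalize_columns(df).to_csv(csv_path, index=False)
    return csv_path
```

With this rule, a header such as `" Mouse Clicks (#) "` becomes `mouse_clicks`, which R can reference without backticks.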
2.1.3 AffectiveROAD (Driving Stress)
- Source: Affective Computing Group / MIT Media Lab
- Context: Real-world driving tasks.
- Participants: Drivers wearing Empatica E4 sensors.
- Conditions: The dataset captures physiological responses during city and highway driving.
- Preprocessing: The raw data were spread across multiple nested folders and ZIP archives. A custom Python script walks the directory tree, extracts the HR.csv (heart rate) files directly from the compressed archives (Left.zip or Right.zip), and aggregates them into a single dataset for analysis.
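The archive traversal can be sketched with the standard `zipfile` and `pathlib` modules. This is a minimal sketch, not the project's script. It assumes the standard Empatica E4 CSV export layout: line 1 holds the start time as a Unix timestamp, line 2 holds the sampling rate in Hz, and each following line holds one value. It also assumes that the folder containing each archive identifies the drive. The `session` and `side` columns and the helper names are hypothetical.

```python
import zipfile
from pathlib import Path

import pandas as pd


def parse_e4_hr(raw_bytes):
    """Parse an E4-style HR.csv: start time (Unix s), sampling rate (Hz), then one value per line."""
    lines = [ln.strip() for ln in raw_bytes.decode("utf-8").splitlines() if ln.strip()]
    t0 = float(lines[0].split(",")[0])
    fs = float(lines[1].split(",")[0])
    hr = [float(v.split(",")[0]) for v in lines[2:]]
    return pd.DataFrame({"timestamp": [t0 + i / fs for i in range(len(hr))], "hr": hr})


def collect_hr(root):
    """Walk `root`, read HR.csv from every Left.zip/Right.zip without unpacking, and stack them."""
    frames = []
    for zpath in sorted(Path(root).rglob("*.zip")):
        if zpath.name not in ("Left.zip", "Right.zip"):
            continue
        with zipfile.ZipFile(zpath) as zf:
            for member in zf.namelist():
                if Path(member).name == "HR.csv":
                    df = parse_e4_hr(zf.read(member))
                    df["session"] = zpath.parent.name  # hypothetical: parent folder = drive ID
                    df["side"] = zpath.stem            # "Left" or "Right" wrist
                    frames.append(df)
    if not frames:
        return pd.DataFrame(columns=["timestamp", "hr", "session", "side"])
    return pd.concat(frames, ignore_index=True)
```

Reading members with `ZipFile.read` avoids writing temporary files. Keeping a `side` column lets the analysis select one wrist per drive afterwards, instead of choosing Left or Right at extraction time.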
2.2 Missing Value Analysis
To ensure data quality, we ran a missing value analysis on the preprocessed datasets. Because the raw sensor recordings are continuous streams, our Python preprocessing scripts handled and excluded corrupted segments before exporting the data to CSV. The analysis therefore checks what remains after that cleaning.
We load the cleaned datasets and visualize any remaining missing values using the naniar package.
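On the Python side, a per-variable missing-value summary can be computed before export as a cross-check. `gg_miss_var` plots per-variable missing counts, and `naniar::miss_var_summary` reports the same numbers as a table with `variable`, `n_miss`, and `pct_miss` columns. The following minimal sketch mirrors that table. The cleaning rule in `drop_corrupted` (treat ±inf as missing, then drop incomplete rows) is an assumption for illustration only, since the source does not say how corrupted segments were detected.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Per-column missing count and percentage, mirroring naniar's miss_var_summary."""
    n_miss = df.isna().sum()
    summary = pd.DataFrame({
        "variable": n_miss.index,
        "n_miss": n_miss.values,
        "pct_miss": 100 * n_miss.values / max(len(df), 1),  # guard against empty frames
    })
    return summary.sort_values("n_miss", ascending=False, kind="stable").reset_index(drop=True)


def drop_corrupted(df, columns):
    """Hypothetical cleaning rule: treat +/-inf as missing, then drop rows incomplete in `columns`."""
    out = df.replace([np.inf, -np.inf], np.nan)
    return out.dropna(subset=columns).reset_index(drop=True)
```

Note that `isna` does not count infinite values as missing, so the summary should be run after any rule that converts them.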
2.2.1 WESAD Dataset Overview
The WESAD dataset is our primary source. As shown below, the selected physiological channels (ECG, EDA, Temp, Resp) contain no missing values, which is consistent with the extraction script producing complete signals.
2.2.2 Comparative Datasets (SWELL & AffectiveROAD)
We also inspect the two supplementary datasets: the SWELL-KW per-minute features and the AffectiveROAD raw heart-rate series. Both show no substantial missing data, so they can be compared directly with WESAD.
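# Packages: naniar (gg_miss_var), ggplot2 (labs, theme_minimal), gridExtra (grid.arrange)
library(naniar)
library(ggplot2)
library(gridExtra)
# road_df and swell_df are the cleaned AffectiveROAD and SWELL-KW data frames loaded from the exported CSVs
# Plot for AffectiveROAD: missing count per variable
p1 <- gg_miss_var(road_df) +
  labs(title = "Missing Values: AffectiveROAD",
       subtitle = "Driving Heart Rate Data",
       y = "Missing Count") +
  theme_minimal()
# Plot for SWELL: missing count per variable
p2 <- gg_miss_var(swell_df) +
  labs(title = "Missing Values: SWELL-KW",
       subtitle = "Office Stress Features",
       y = "Missing Count") +
  theme_minimal()
# Arrange plots side by side
grid.arrange(p1, p2, ncol = 2)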